Applying Rule Induction in Software Prediction
Authors: Twala, Cartwright & Shepperd
Abstract
Recently, the use of machine learning (ML) algorithms has proven to be of great practical value in solving a variety of software engineering problems, including software prediction (for example, predicting cost and defects). An important advantage of machine learning over statistical analysis as a modelling technique is that production rules are more straightforward for human beings to interpret than, say, principal components or patterns whose meaning is represented only by numbers. The main focus of this chapter is rule induction (RI): it provides some background on RI and its key issues, and examines how RI has been used to handle uncertainty in data. Applications of RI to prediction and to other software engineering tasks are considered. The chapter concludes by identifying future research directions for applying rule induction to software prediction; such work might also help to solve new problems related to rule induction and prediction.
This chapter, by Twala, Cartwright & Shepperd, appears in Advances in Machine Learning Applications in Software Engineering, edited by Du Zhang & Jeffrey J.P. Tsai, © 2007, IGI Global (Idea Group Inc.).
Introduction
Machine learning (ML), which has been making great progress in many directions, is a hallmark of machine intelligence, just as human learning is the hallmark of human intelligence. The ability to learn and reason from observations and experience seems to be crucial for any intelligent being (Forsyth & Rada, 1986; Holland, 1975; Winston, 1992).
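The abstract's central claim, that production rules are easier for people to read than numeric models, can be illustrated with a small sketch. The code below is a simplified, PRISM-style sequential covering learner (after Cendrowska's PRISM algorithm); it is not necessarily the rule induction method the chapter itself discusses. The module attributes (`size`, `complexity`, `churn`), their values, the class labels and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical toy data: software modules described by categorical
# attribute-value pairs plus a class label. All values are invented.
DATA = [
    ({"size": "large", "complexity": "high", "churn": "high"}, "defective"),
    ({"size": "large", "complexity": "high", "churn": "low"}, "defective"),
    ({"size": "large", "complexity": "low", "churn": "high"}, "defective"),
    ({"size": "large", "complexity": "low", "churn": "low"}, "clean"),
    ({"size": "small", "complexity": "high", "churn": "high"}, "defective"),
    ({"size": "small", "complexity": "high", "churn": "low"}, "clean"),
    ({"size": "small", "complexity": "low", "churn": "high"}, "clean"),
    ({"size": "small", "complexity": "low", "churn": "low"}, "clean"),
]


def covers(conditions, row):
    """True if the row satisfies every attribute = value test in the rule."""
    return all(row.get(attr) == val for attr, val in conditions.items())


def learn_rules_for_class(data, target):
    """PRISM-style sequential covering for a single class value."""
    rules = []
    remaining = list(data)
    while any(label == target for _, label in remaining):
        conditions = {}
        covered = remaining
        # Specialise the rule until it covers only examples of the target class.
        while any(label != target for _, label in covered):
            # Sorted so that ties are broken deterministically.
            candidates = sorted(
                {(attr, val) for row, _ in covered for attr, val in row.items()
                 if attr not in conditions}
            )
            if not candidates:
                break  # attributes exhausted: the data are inconsistent here

            def score(cond):
                attr, val = cond
                labels = [lab for row, lab in covered if row[attr] == val]
                hits = sum(lab == target for lab in labels)
                # Prefer high precision, then more target examples covered.
                return (hits / len(labels), hits)

            best_attr, best_val = max(candidates, key=score)
            conditions[best_attr] = best_val
            covered = [(row, lab) for row, lab in covered if row[best_attr] == best_val]
        rules.append((conditions, target))
        # Remove the examples this rule covers, then learn the next rule.
        remaining = [(row, lab) for row, lab in remaining if not covers(conditions, row)]
    return rules


def induce(data):
    """Learn rules for every class value (classes taken in sorted order)."""
    rules = []
    for target in sorted({label for _, label in data}):
        rules.extend(learn_rules_for_class(data, target))
    return rules


def format_rule(conditions, label):
    """Render a rule as a human-readable IF-THEN production rule."""
    lhs = " AND ".join(f"{a} = {v}" for a, v in sorted(conditions.items())) or "TRUE"
    return f"IF {lhs} THEN {label}"


if __name__ == "__main__":
    for conditions, label in induce(DATA):
        print(format_rule(conditions, label))
```

On this toy data the sketch produces six two-condition rules, for example `IF churn = high AND complexity = high THEN defective`. Each rule is grown greedily by adding the attribute-value test with the highest precision for the target class, which keeps rules short and readable but does not guarantee a minimal rule set.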
One major problem in applying ML algorithms to software engineering is the scarcity of software data, that is, data for training the model. Surveys for collecting software engineering data are usually small, yet difficult and expensive to conduct. This is due to a shortage of people with the expertise required to collect and maintain high-quality information, and to the nature of software development, which does not lend itself to data collection in an experimental setting. The lack of data can also arise from the need for confidentiality: industrial companies are often reluctant to allow access to data on software failures because people might then think less highly of their products. Another problem, as far as ML application is concerned, is the form that the dataset takes, that is, its characteristics. Most learning techniques assume that the data are presented in a simple attribute-value format. Another important feature of a problem domain is the quality of the available data. Most real data are imperfect: incomplete, irrelevant, redundant, noisy, or erroneous. The aim of a learning system is to discover a set of decision rules that is complete, in that it describes all of the data, and that predicts the data accurately.
In this chapter, we explore what RI can do in the software engineering domain, in order to increase awareness of learning methods for building software prediction models. Our main focus, however, is on the application of rule induction to software prediction. The topic is significant because:
1. RI is an emerging technology that can aid in the discovery of rules and patterns in sets of data.
2. RI has an advantage over statistical analysis as a modelling technique, because production rules are more straightforward for human beings to interpret than, say, principal components or patterns whose meaning is represented only by numbers.
3. Owing to the lack of adequate tools for software project evaluation and estimation, RI strategies have been used to tackle such problems, including software prediction.
Rule induction is one of the most established and effective data mining technologies in use today, and it has been applied successfully in several disciplines and real-world domains. These include: preventing breakdowns in electric transformers (Riese, 1984); increasing yield in chemical process control (Leech, 1986); improving the separation of gas from oil (Guilfoyle, 1986); making credit card decisions (Michie, 1989); diagnosing mechanical devices (Giordana, Neri, & Saitta, 1996); monitoring the quality of rolling emulsions (Karba & Drole, 1990); diagnosing coronary heart disease and detecting risk groups (Gamberger, Lavrac, & Krstacic, 2002); and categorising text documents (Johnson, Oles, Zhang, & Goetz, 2002), among other areas.
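The goal of a rule set that describes all of the data and predicts it accurately, together with the observation that real data are often incomplete, can be made concrete by measuring a rule set's coverage and accuracy. The sketch below is a minimal illustration under several stated assumptions: the rules are applied as an ordered decision list, a test on a missing value (`None`) counts as unsatisfied, and missing values are filled by mode imputation, which is only one of many possible strategies. The rules, records and function names are hypothetical.

```python
from collections import Counter
from typing import Optional

# Hypothetical rule set and records, invented for illustration only.
Record = dict[str, Optional[str]]
Rule = tuple[dict[str, str], str]

RULES: list[Rule] = [
    ({"size": "large", "complexity": "high"}, "defective"),
    ({"complexity": "low"}, "clean"),
]

RECORDS: list[tuple[Record, str]] = [
    ({"size": "large", "complexity": "high"}, "defective"),
    ({"size": "small", "complexity": "low"}, "clean"),
    ({"size": "large", "complexity": "low"}, "defective"),  # misclassified by rule 2
    ({"size": "small", "complexity": None}, "clean"),       # missing value
    ({"size": "small", "complexity": "high"}, "clean"),     # no rule fires
    ({"size": "small", "complexity": "low"}, "clean"),
]


def matches(conditions: dict[str, str], record: Record) -> bool:
    # Assumption: a test on a missing (None) value is treated as unsatisfied.
    return all(record.get(attr) == val for attr, val in conditions.items())


def classify(rules: list[Rule], record: Record) -> Optional[str]:
    """Decision-list semantics: the first rule that fires gives the class."""
    for conditions, label in rules:
        if matches(conditions, record):
            return label
    return None  # record not covered by any rule


def evaluate(rules: list[Rule], records: list[tuple[Record, str]]) -> tuple[float, float]:
    """Return (coverage, accuracy on the covered records)."""
    pairs = [(classify(rules, rec), actual) for rec, actual in records]
    covered = [(pred, actual) for pred, actual in pairs if pred is not None]
    coverage = len(covered) / len(records)
    accuracy = sum(p == a for p, a in covered) / len(covered) if covered else 0.0
    return coverage, accuracy


def impute_mode(records: list[tuple[Record, str]]) -> list[tuple[Record, str]]:
    """Replace each missing value with the attribute's most frequent observed value."""
    attrs = sorted({attr for rec, _ in records for attr in rec})
    modes = {}
    for attr in attrs:
        seen = Counter(rec[attr] for rec, _ in records if rec.get(attr) is not None)
        modes[attr] = seen.most_common(1)[0][0] if seen else None
    return [
        ({attr: rec.get(attr) if rec.get(attr) is not None else modes[attr]
          for attr in attrs}, actual)
        for rec, actual in records
    ]


if __name__ == "__main__":
    print("before imputation: coverage=%.2f accuracy=%.2f" % evaluate(RULES, RECORDS))
    print("after imputation:  coverage=%.2f accuracy=%.2f"
          % evaluate(RULES, impute_mode(RECORDS)))
```

In this toy example, filling the missing `complexity` value with its mode (`low`) brings one more record under the rule set, raising coverage from 4/6 to 5/6. The record for which no rule fires shows why completeness matters: an incomplete rule set simply has nothing to say about such cases.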
Similar resources
Combining techniques to optimize effort predictions in software project management
This paper tackles two questions related to software effort prediction. First, is it valuable to combine prediction techniques? Second, if so, how? Many commentators have suggested the use of more than one technique in order to support effort prediction, but to date there has been little or no empirical investigation to support this recommendation. Our analysis of effort data from a medical rec...
Determination of Algorithms Making Balance Between Accuracy and Comprehensibility in Churn Prediction Setting
Predictive modeling is a useful tool for identifying customers who are at risk of churn. An appropriate churn prediction model should be both accurate and comprehensible. However, a review of past research in this context shows that more attention has been paid to the accuracy of churn prediction models than to their comprehensibility. This paper compares three different rule induction techniques from...
Software Verification Using k-Induction (extended version including appendix with proofs)
We present combined-case k-induction, a novel technique for verifying software programs. This technique draws on the strengths of the classical inductive-invariant method and a recent application of k-induction to program verification. In previous work, correctness of programs was established by separately proving a base case and inductive step. We present a new k-induction rule that takes an u...
The artificial neural networks model for software effort estimation
Machine learning techniques such as neural networks, rule induction, genetic algorithm and case-based reasoning are finding application in a wide variety of fields such as computer vision, econometrics and medicine, where human abilities have proven to be superior to those of computers. Such techniques hold the promise of being able to make sense of a variety of inputs of different types in pro...
Applying Reliability Engineering to Expert Systems
Often a rule-based system is tested by checking its performance on a number of test cases with known solutions, modifying the system until it gives the correct results for all or a sufficiently high proportion of the test cases. However, the performance on the test cases may not accurately predict performance of the system in actual use. In this paper we discuss why this testing method does no...